Incremental decision tree

Most decision tree methods take a complete data set and build a tree using that data. This tree cannot be changed if new data is acquired later.

Incremental decision trees are built using methods that allow an existing tree to be updated or revised using new, individual data instances. This is useful in several situations: a) the entire data set is not available when the original tree is built, b) the original data set is too large to process at once, or c) the characteristics of the data change over time.

Applications

On-line learning
Data streams
Concept drift
Data that can be modeled well by a hierarchical model
Systems where a user-interpretable output is desired

Methods

Here is a short list of incremental decision tree methods, organized by their (usually non-incremental) parent algorithms.

CART family

CART ^[1] (1984) is a nonincremental decision tree inducer for both classification and regression problems. It was developed in the mathematics and statistics communities and traces its roots to AID (1963).^[2]

Incremental CART (1989) ^[3] is Crawford's modification of CART that incorporates data incrementally.

ID3/C4.5 family

ID3 (1986)^[4] and C4.5 (1993)^[5] were developed by Quinlan and have roots in Hunt's Concept Learning System (CLS, 1966).^[6] The ID3 family of tree inducers was developed in the engineering and computer science communities.

ID3' (1986) ^[7] was suggested by Schlimmer and Fisher. It was a brute-force method to make ID3 incremental; after each new data instance is acquired, an entirely new tree is induced using ID3.
ID4 (1986)^[7] could incorporate data incrementally. However, certain concepts were unlearnable, because ID4 discards subtrees when a new test is chosen for a node.
ID5 (1988) ^[8] did not discard subtrees, but it also did not guarantee that it would produce the same tree as ID3.
ID5R (1989)^[9] output the same tree as ID3 for a dataset regardless of the incremental training order. This was accomplished by recursively updating the tree's subnodes. It did not handle numeric variables, multiclass classification tasks, or missing values.
ID6MDL (2007)^[10] is an extended version of the ID3 or ID5R algorithms.
ITI (1997)^[11] is an efficient method for incrementally inducing decision trees. The same tree is produced for a dataset regardless of the data's presentation order, and regardless of whether the tree is induced incrementally or non-incrementally (batch mode). It can accommodate numeric variables, multiclass tasks, and missing values. Code is available on the web. [1]
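
The brute-force idea behind ID3' described above (store every instance and induce an entirely new tree after each arrival) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of Schlimmer and Fisher: the `id3` function is a simplified batch inducer for categorical attributes only, and the names `id3`, `entropy`, and `BruteForceIncrementalID3` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def id3(rows, labels, attributes):
    """Simplified batch ID3: rows are dicts of categorical attribute values.

    Returns a class label (leaf) or a pair (attribute, {value: subtree}).
    """
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf

    def remainder(attr):
        # Weighted entropy of the partitions produced by splitting on attr.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Maximising information gain is the same as minimising the remainder.
    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3([r for r, _ in pairs], [l for _, l in pairs], rest)
    return (best, branches)


class BruteForceIncrementalID3:
    """ID3'-style wrapper: keep all instances, rebuild the tree on every update."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        self.rows, self.labels = [], []
        self.tree = None

    def update(self, row, label):
        self.rows.append(row)
        self.labels.append(label)
        # Hypothetical brute-force step: discard the old tree and re-induce.
        self.tree = id3(self.rows, self.labels, self.attributes)
        return self.tree
```

Because the tree is rebuilt from all stored data each time, the result always matches batch ID3, but the cost of every update grows with the number of instances seen so far. Avoiding that cost is the motivation for ID4, ID5R, and ITI.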

Note: ID6NB (2009)^[12] is not incremental.

STAGGER

Schlimmer and Granger's STAGGER (1986) ^[13] was an early incremental learning system. It was developed to examine concepts that changed over time (concept drift).

VFDT

The Very Fast Decision Tree (VFDT) learner reduces training time for large incremental data sets by subsampling the incoming data stream.
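
In the VFDT paper (Domingos and Hulten, 2000), the amount of stream data a node needs before it commits to a split is governed by the Hoeffding bound. The sketch below illustrates that test only; it is not the VFML implementation, and the function names and parameter values are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, value_range, delta, n):
    """Split once the observed gap between the two best attributes exceeds
    epsilon, i.e. when more data is unlikely to change the winner."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical numbers: a gain gap of 0.05, range 1.0, delta = 1e-7.
for n in (1000, 10000):
    print(n, hoeffding_bound(1.0, 1e-7, n), can_split(0.30, 0.25, 1.0, 1e-7, n))
```

The bound shrinks as more examples reach a node, so a node with a clear winner splits after few examples, while a node with closely matched attributes waits for more data.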

VFDT (2000) ^[14]
CVFDT (2001) ^[15] can adapt to concept drift by using a sliding window on incoming data. Old data outside the window is forgotten.
VFDTc (2006) ^[16] extends VFDT for continuous data, concept drift, and application of Naive Bayes classifiers in the leaves.
VFML (2003) is a toolkit that is available on the web. [2] It was developed by the creators of VFDT and CVFDT.
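
The sliding-window forgetting described for CVFDT can be illustrated with a small sketch that keeps sufficient statistics (counts) for only the most recent instances. This is a simplified, hypothetical example: the class name `SlidingWindowCounts` and its methods are assumptions, and the actual CVFDT algorithm maintains such statistics at the nodes of a tree rather than in a single table.

```python
from collections import Counter, deque


class SlidingWindowCounts:
    """Counts of (attribute value, class label) pairs over the last
    window_size instances; older instances are forgotten."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, attribute_value, label):
        key = (attribute_value, label)
        self.window.append(key)
        self.counts[key] += 1
        if len(self.window) > self.window_size:
            # Forget the oldest instance by decrementing its count.
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def majority_label(self, attribute_value):
        """Most frequent label for this attribute value inside the window."""
        votes = Counter({label: n for (value, label), n in self.counts.items()
                         if value == attribute_value})
        return votes.most_common(1)[0][0] if votes else None
```

When the underlying concept changes, the counts from before the change leave the window after `window_size` further instances, so statistics computed from the window reflect only the current concept.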

OLIN and IFN

OLIN (2002) ^[17]
IOLIN (2008) ^[18] is based on the Info-Fuzzy Network (IFN).^[19]

References

^ Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984) Classification and regression trees. Belmont, CA: Wadsworth International Group.
^ Morgan, J. N., & Sonquist, J. A. (1963) Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc., 58, 415-434.
^ Crawford, S. L. (1989) Extensions to the CART algorithm. International journal of man-machine studies. 31, 197-217.
^ Quinlan, J. R. (1986) Induction of Decision Trees. Machine Learning 1(1), 81-106.
^ Quinlan, J. R. (1993) C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
^ Hunt, E. B., Marin, J., & Stone, P. J. (1966) Experiments in induction. New York: Academic Press.
^ Schlimmer, J. C., & Fisher, D. (1986) A case study of incremental concept induction. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 496-501). Philadelphia, PA: Morgan Kaufmann.
^ Utgoff, P. (1988) ID5: An incremental ID3. Fifth International Conference on Machine Learning, pp. 107-120. Morgan Kaufmann Publishers.
^ Utgoff, P. E. (1989) Incremental induction of decision trees. Machine Learning 4, 161-186.
^ Kroon, M., Korzec, S., Adriani, P. (2007) ID6MDL: Post-Pruning Incremental Decision Trees.
^ Utgoff, P. E., Berkman, N. C., & Clouse, J. A. (1997) Decision tree induction based on efficient tree restructuring. Machine Learning 29, 5-44.
^ Appavu, S., & Rajaram, R. (2009) Knowledge-based system for text classification using ID6NB algorithm. Knowledge-based systems 22 1-7.
^ Schlimmer, J. C., & Granger, R. H., Jr. (1986) Incremental learning from noisy data. Machine Learning 1, 317-354.
^ Domingos, P., Hulten, G. (2000) Mining high-speed data streams. Proceedings KDD 2000, ACM Press, New York, NY, USA, pp. 71–80.
^ Hulten, G., Spencer, L., Domingos, P. (2001) Mining time-changing data streams. Proceedings KDD 2001, ACM Press, New York, NY, pp. 97–106.
^ Gama, J., Fernandes, R., & Rocha, R. (2006) Decision trees for mining data streams. Intelligent Data Analysis 10 23-45.
^ Last, M. (2002) Online classification of nonstationary data streams, Intell. Data Anal. 6(2) 129–147.
^ Cohen, L., Avrahami, G., Last, M., Kandel, A. (2008) Info-fuzzy algorithms for mining dynamic data streams. Applied soft computing. 8 1283-1294.
^ Maimon, O., Last, M. (2000) The info-fuzzy network (IFN) methodology. Knowledge Discovery and Data Mining. Boston: Kluwer Academic Publishers.

External links

ITI code. http://www-lrn.cs.umass.edu/iti/index.html
VFML code. http://www.cs.washington.edu/dm/vfml/